Goto

Collaborating Authors

 ecg image


Pic2Diagnosis: A Method for Diagnosis of Cardiovascular Diseases from the Printed ECG Pictures

Büyüksolak, Oğuzhan, Öksüz, İlkay

arXiv.org Artificial Intelligence

The electrocardiogram (ECG) is a vital tool for diagnosing heart diseases. However, many disease patterns are derived from outdated datasets and traditional stepwise algorithms with limited accuracy. This study presents a method for direct cardiovascular disease (CVD) diagnosis from ECG images, eliminating the need for digitization. The proposed approach utilizes a two-step curriculum learning framework, beginning with the pre-training of a classification model on segmentation masks, followed by fine-tuning on grayscale, inverted ECG images. Robustness is further enhanced through an ensemble of three models with averaged outputs, achieving an AUC of 0.9534 and an F1 score of 0.7801 on the BHF ECG Challenge dataset, outperforming individual models. By effectively handling real-world artifacts and simplifying the diagnostic process, this method offers a reliable solution for automated CVD diagnosis, particularly in resource-limited settings where printed or scanned ECG images are commonly used. Such an automated procedure enables rapid and accurate diagnosis, which is critical for timely intervention in CVD cases that often demand urgent care.


Masked Training for Robust Arrhythmia Detection from Digitalized Multiple Layout ECG Images

Zhang, Shanwei, Zhang, Deyun, Tao, Yirao, Wang, Kexin, Geng, Shijia, Li, Jun, Zhao, Qinghao, Liu, Xingpeng, Zhou, Yuxi, Hong, Shenda

arXiv.org Artificial Intelligence

Electrocardiogram (ECG) as an important tool for diagnosing cardiovascular diseases such as arrhythmia. Due to the differences in ECG layouts used by different hospitals, the digitized signals exhibit asynchronous lead time and partial blackout loss, which poses a serious challenge to existing models. To address this challenge, the study introduced PatchECG, a framework for adaptive variable block count missing representation learning based on a masking training strategy, which automatically focuses on key patches with collaborative dependencies between leads, thereby achieving key recognition of arrhythmia in ECGs with different layouts. Experiments were conducted on the PTB-XL dataset and 21388 asynchronous ECG images generated using ECG image kit tool, using the 23 Subclasses as labels. The proposed method demonstrated strong robustness under different layouts, with average Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.835 and remained stable (unchanged with layout changes). In external validation based on 400 real ECG images data from Chaoyang Hospital, the AUROC for atrial fibrillation diagnosis reached 0.778; On 12 x 1 layout ECGs, AUROC reaches 0.893. This result is superior to various classic interpolation and baseline methods, and compared to the current optimal large-scale pre-training model ECGFounder, it has improved by 0.111 and 0.19.


Beyond Single-Channel: Multichannel Signal Imaging for PPG-to-ECG Reconstruction with Vision Transformers

Li, Xiaoyan, Xu, Shixin, Habib, Faisal, Gupta, Arvind, Huang, Huaxiong

arXiv.org Artificial Intelligence

Reconstructing ECG from PPG is a promising yet challenging task. While recent advancements in generative models have significantly improved ECG reconstruction, accurately capturing fine-grained waveform features remains a key challenge. To address this, we propose a novel PPG-to-ECG reconstruction method that leverages a Vision Transformer (ViT) as the core network. Unlike conventional approaches that rely on single-channel PPG, our method employs a four-channel signal image representation, incorporating the original PPG, its first-order di ff erence, second-order di fference, and area under the curve. This multi-channel design enriches feature extraction by preserving both temporal and physiological variations within the PPG. Experimental results demonstrate that our method consistently outperforms existing 1D convolution-based approaches, achieving up to 29% reduction in PRD and 15% reduction in RMSE. The proposed approach also produces improvements in other evaluation metrics, highlighting its robustness and e ff ectiveness in reconstructing ECG signals. Furthermore, to ensure a clinically relevant evaluation, we introduce new performance metrics, including QRS area error, PR interval error, RT interval error, and RT amplitude di fference error. Beyond demonstrating the potential of PPG as a viable alternative for heart activity monitoring, our approach opens new avenues for cyclic signal analysis and prediction. Introduction Electrocardiograms (ECGs) are essential tools for diagnosing and monitoring cardiovascular health, providing crucial insights into heart rate variability (HRV), heart rate, and key waveform features. These include the QRS complex, PR interval, ST segment, TP interval, and QT interval, which are vital for understanding the heart's electrical activity and diagnosing various cardiac conditions [1, 2]. Prolonged PR intervals may indicate first-degree atrioventricular (A V) block or delayed conduction through the A V node, suggesting potential cardiac conduction issues [3]. Conversely, shortened PR intervals might imply conditions such as Wol ff-Parkinson-White (WPW) syndrome or Lown-Ganong-Levine syndrome, where accessory pathways bypass the normal A V nodal delay [3].


MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations

Zhang, Deyun, Lan, Xiang, Geng, Shijia, Zhao, Qinghao, Fan, Sumei, Feng, Mengling, Hong, Shenda

arXiv.org Artificial Intelligence

Electrocardiogram (ECG) plays a foundational role in modern cardiovascular care, enabling non-invasive diagnosis of arrhythmias, myocardial ischemia, and conduction disorders. While machine learning has achieved expert-level performance in ECG interpretation, the development of clinically deployable multimodal AI systems remains constrained, primarily due to the lack of publicly available datasets that simultaneously incorporate raw signals, diagnostic images, and interpretation text. Most existing ECG datasets provide only single-modality data or, at most, dual modalities, making it difficult to build models that can understand and integrate diverse ECG information in real-world settings. To address this gap, we introduce MEETI (MIMIC-IV-Ext ECG-Text-Image), the first large-scale ECG dataset that synchronizes raw waveform data, high-resolution plotted images, and detailed textual interpretations generated by large language models. In addition, MEETI includes beat-level quantitative ECG parameters extracted from each lead, offering structured parameters that support fine-grained analysis and model interpretability. Each MEETI record is aligned across four components: (1) the raw ECG waveform, (2) the corresponding plotted image, (3) extracted feature parameters, and (4) detailed interpretation text. This alignment is achieved using consistent, unique identifiers. This unified structure supports transformer-based multimodal learning and supports fine-grained, interpretable reasoning about cardiac health. By bridging the gap between traditional signal analysis, image-based interpretation, and language-driven understanding, MEETI established a robust foundation for the next generation of explainable, multimodal cardiovascular AI. It offers the research community a comprehensive benchmark for developing and evaluating ECG-based AI systems.


Deep Learning-Based Digitization of Overlapping ECG Images with Open-Source Python Code

Karbasi, Reza, Rahimi, Masoud, Vahabie, Abdol-Hossein, Moradi, Hadi

arXiv.org Artificial Intelligence

--This paper addresses the persistent challenge of accurately digitizing paper-based electrocardiogram (ECG) recordings, with a particular focus on robustly handling single leads compromised by signal overlaps--a common yet under-addressed issue in existing methodologies. We propose a two-stage pipeline designed to overcome this limitation. The first stage employs a U-Net based segmentation network, trained on a dataset enriched with overlapping signals and fortified with custom data augmentations, to accurately isolate the primary ECG trace. The subsequent stage converts this refined binary mask into a time-series signal using established digitization techniques, enhanced by an adaptive grid detection module for improved versatility across different ECG formats and scales. Our experimental results demonstrate the efficacy of our approach. The U-Net architecture achieves an Intersection over Union (IoU) of 0.87 for the fine-grained segmentation task. Crucially, our proposed digitization method yields superior performance compared to a well-established baseline technique across both non-overlapping and challenging overlapping ECG samples. For non-overlapping signals, our method achieved a Mean Squared Error (MSE) of 0.0010 and a Pearson Correlation Coefficient ( ρ) of 0.9644, compared to 0.0015 and 0.9366, respectively, for the baseline. On samples with signal overlap, our method achieved an MSE of 0.0029 and a ρ of 0.9641, significantly improving upon the baseline's 0.0178 and 0.8676. This work demonstrates an effective strategy to significantly enhance digitization accuracy, especially in the presence of signal overlaps, thereby laying a strong foundation for the reliable conversion of analog ECG records into analyzable digital data for contemporary research and clinical applications. Electrocardiogram (ECG) serves as a cornerstone in the diagnosis and ongoing monitoring of cardiovascular diseases, which persist as a primary cause of mortality globally [1]. The ability to access and analyze ECG time-series data substantially enhances the efficacy of deep learning-based clinical decision support systems [2].


An Open-Source Python Framework and Synthetic ECG Image Datasets for Digitization, Lead and Lead Name Detection, and Overlapping Signal Segmentation

Rahimi, Masoud, Karbasi, Reza, Vahabie, Abdol-Hossein

arXiv.org Artificial Intelligence

We introduce an open-source Python framework for generating synthetic ECG image datasets to advance critical deep learning-based tasks in ECG analysis, including ECG digitization, lead region and lead name detection, and pixel-level waveform segmentation. Using the PTB-XL signal dataset, our proposed framework produces four open-access datasets: (1) ECG images in various lead configurations paired with time-series signals for ECG digitization, (2) ECG images annotated with YOLO-format bounding boxes for detection of lead region and lead name, (3)-(4) cropped single-lead images with segmentation masks compatible with U-Net-based models in normal and overlapping versions. In the overlapping case, waveforms from neighboring leads are superimposed onto the target lead image, while the segmentation masks remain clean. The open-source Python framework and datasets are publicly available at https://github.com/rezakarbasi/ecg-image-and-signal-dataset and https://doi.org/10.5281/zenodo.15484519, respectively.


GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images

Lan, Xiang, Wu, Feng, He, Kai, Zhao, Qinghao, Hong, Shenda, Feng, Mengling

arXiv.org Artificial Intelligence

While recent multimodal large language models (MLLMs) have advanced automated ECG interpretation, they still face two key limitations: (1) insufficient multimodal synergy between time series signals and visual ECG representations, and (2) limited explainability in linking diagnoses to granular waveform evidence. We introduce GEM, the first MLLM unifying ECG time series, 12-lead ECG images and text for grounded and clinician-aligned ECG interpretation. GEM enables feature-grounded analysis, evidence-driven reasoning, and a clinician-like diagnostic process through three core innovations: a dual-encoder framework extracting complementary time series and image features, cross-modal alignment for effective multimodal understanding, and knowledge-guided instruction generation for generating high-granularity grounding data (ECG-Grounding) linking diagnoses to measurable parameters ($e.g.$, QRS/PR Intervals). Additionally, we propose the Grounded ECG Understanding task, a clinically motivated benchmark designed to comprehensively assess the MLLM's capability in grounded ECG understanding. Experimental results on both existing and our proposed benchmarks show GEM significantly improves predictive performance (CSN $7.4\% \uparrow$), explainability ($22.7\% \uparrow$), and grounding ($24.8\% \uparrow$), making it more suitable for real-world clinical applications. GitHub repository: https://github.com/lanxiang1017/GEM.git


Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Liu, Ruoqi, Bai, Yuelin, Yue, Xiang, Zhang, Ping

arXiv.org Artificial Intelligence

The electrocardiogram (ECG) is an essential non-invasive diagnostic tool for assessing cardiac conditions. Existing automatic interpretation methods suffer from limited generalizability, focusing on a narrow range of cardiac conditions, and typically depend on raw physiological signals, which may not be readily available in resource-limited settings where only printed or digital ECG images are accessible. Recent advancements in multimodal large language models (MLLMs) present promising opportunities for addressing these challenges. However, the application of MLLMs to ECG image interpretation remains challenging due to the lack of instruction tuning datasets and well-established ECG image benchmarks for quantitative evaluation. To address these challenges, we introduce ECGInstruct, a comprehensive ECG image instruction tuning dataset of over one million samples, covering a wide range of ECG-related tasks from diverse data sources. Using ECGInstruct, we develop PULSE, an MLLM tailored for ECG image comprehension. In addition, we curate ECGBench, a new evaluation benchmark covering four key ECG image interpretation tasks across nine different datasets. Our experiments show that PULSE sets a new state-of-the-art, outperforming general MLLMs with an average accuracy improvement of 15% to 30%. This work highlights the potential of PULSE to enhance ECG interpretation in clinical practice.


ECG-Image-Database: A Dataset of ECG Images with Real-World Imaging and Scanning Artifacts; A Foundation for Computerized ECG Image Digitization and Analysis

Reyna, Matthew A., Deepanshi, null, Weigle, James, Koscova, Zuzana, Campbell, Kiersten, Shivashankara, Kshama Kodthalu, Saghafi, Soheil, Nikookar, Sepideh, Motie-Shirazi, Mohsen, Kiarashi, Yashar, Seyedi, Salman, Clifford, Gari D., Sameni, Reza

arXiv.org Artificial Intelligence

We introduce the ECG-Image-Database, a large and diverse collection of electrocardiogram (ECG) images generated from ECG time-series data, with real-world scanning, imaging, and physical artifacts. We used ECG-Image-Kit, an open-source Python toolkit, to generate realistic images of 12-lead ECG printouts from raw ECG time-series. The images include realistic distortions such as noise, wrinkles, stains, and perspective shifts, generated both digitally and physically. The toolkit was applied to 977 12-lead ECG records from the PTB-XL database and 1,000 from Emory Healthcare to create high-fidelity synthetic ECG images. These unique images were subjected to both programmatic distortions using ECG-Image-Kit and physical effects like soaking, staining, and mold growth, followed by scanning and photography under various lighting conditions to create real-world artifacts. The resulting dataset includes 35,595 software-labeled ECG images with a wide range of imaging artifacts and distortions. The dataset provides ground truth time-series data alongside the images, offering a reference for developing machine and deep learning models for ECG digitization and classification. The images vary in quality, from clear scans of clean papers to noisy photographs of degraded papers, enabling the development of more generalizable digitization algorithms. ECG-Image-Database addresses a critical need for digitizing paper-based and non-digital ECGs for computerized analysis, providing a foundation for developing robust machine and deep learning models capable of converting ECG images into time-series. The dataset aims to serve as a reference for ECG digitization and computerized annotation efforts. ECG-Image-Database was used in the PhysioNet Challenge 2024 on ECG image digitization and classification.


CNN Based Detection of Cardiovascular Diseases from ECG Images

Sayin, Irem, Gursoy, Rana, Cicek, Buse, Mert, Yunus Emre, Ozturk, Fatih, Pamukcu, Taha Emre, Sevimli, Ceylin Deniz, Uvet, Huseyin

arXiv.org Artificial Intelligence

This study develops a Convolutional Neural Network (CNN) model for detecting myocardial infarction (MI) from Electrocardiogram (ECG) images. The model, built using the InceptionV3 architecture and optimized through transfer learning, was trained using ECG data obtained from the Ch. Pervaiz Elahi Institute of Cardiology in Pakistan. The dataset includes ECG images representing four different cardiac conditions: myocardial infarction, abnormal heartbeat, history of myocardial infarction, and normal heart activity. The developed model successfully detects MI and other cardiovascular conditions with an accuracy of 93.27%. This study demonstrates that deep learning-based models can provide significant support to clinicians in the early detection and prevention of heart attacks.